Remarks

Claims 1-13 are pending in the application. Claims 1-13 are rejected. All
rejections are respectfully traversed.

The invention provides a method for determining similarities of
interpretation between portions of multimedia (videos) at a very high level,
e.g., similar action in an adventure movie, scoring opportunities in a sports
video, romantic activity in a gothic movie, fright in a horror movie, humor in
a comedy movie, and so forth. The term 'high-level' is used because the
similarity considers a sequence of semantic events extended over a relatively 
long time period. 

Therefore, the invention segments multimedia content to extract video object
planes, which can encode arbitrary-shaped objects according to the MPEG-4
standard, also known as H.264 or AVC, see Specification, page 2:

"Newer video coding standards, such as MPEG-4, see "Information
Technology — Generic coding of audio visual objects," ISO/IEC FDIS
14496-2 (MPEG-4 Visual), Nov. 1998, allow arbitrary-shaped objects
to be encoded and decoded as separate video object planes (VOP)...
The most recent standardization effort taken on by the MPEG
committee is that of MPEG-7, formally called "Multimedia Content
Description Interface," see "MPEG-7 Context, Objectives and
Technical Roadmap," ISO/IEC N2729, March 1999. Essentially, this
standard plans to incorporate a set of descriptors and description
schemes that can be used to describe various types of multimedia
content."

In the art, these newer, high-level structures are distinguished from older, 
low-level features such as color and motion.

Claims 1-13 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Yeo et al., U.S. Patent No. 5,821,945 (Yeo), in view of Boetje et al., U.S.
Patent No. 6,049,332 (Boetje).

Video object planes (VOP) are defined in the H.264/MPEG-4 or AVC
standards. The call for proposals was in May 1998, and the first draft design
for the new standard was not adopted until 1999. The Yeo patent
application was filed in May 1997. Yeo could not have known about video
object planes as claimed.

The invention segments multimedia content to extract video object planes. 
The decomposition of videos into a hierarchical scene transition graph
according to Yeo reflects acts, scenes and shots of the video, not video object 
planes. 

Yeo does not extract and associate features of the video object planes to
produce content entities. Instead, the browsing process of Yeo is "automated
to extract a hierarchical decomposition of a complex video selection in four
steps: the identification of video shots, the clustering of video shots of
similar visual contents, the presentation of the content and structure to the
users via the scene transition graph, and finally the hierarchical organization
of the graph structure."

Yeo does not measure high-level attributes of each content entity. Yeo states:
"Low level vision analyses operated on video frames achieve
reasonably good results for the measurement of similarity (or
dissimilarity) of different shots. Similarity measures based on image
attributes such as color, spatial correlation and shape can distinguish
different shots to a significant degree, even when operated on much
reduced images as the DC images. Both color and simple shape
information are used to measure similarity of the shots."

The problems with low-level features as in Yeo are distinguished in the
present application at pages 2 and 3:

"Another problem with such low-level descriptors, in general, is that a 
high-level interpretation of the object or multimedia content is 
difficult to obtain. Hence, there is a limitation in the level of 
representation. To overcome the drawbacks mentioned above and 
obtain a higher-level of representation, one may consider more 
elaborate description schemes that combine several low-level
descriptors. In fact, these description schemes may even contain other
description schemes, see "MPEG-7 Description Schemes (V0.5),"
ISO/IEC N2844, July 1999."

Yeo does not describe content entities and comparing the ordered content
entities in a plurality of the directed acyclic graphs to determine similar
interpretations of the multimedia content.

Boetje does not teach comparing ordered content entities in a plurality of 
directed acyclic graphs (DAG). The Boetje system and method "creates a 
broadcast tree comprising a hierarchy of broadcast constituents, each 
constituent represented as a node in the tree," see abstract and summary. The 
hierarchy is necessary to traverse the tree in an up and down order. "Thus, to 
generate a broadcast, the tree is traversed beginning at the highest order 
constituent, and for each higher order constituent, the associations among 
lower order constituents of the same order are evaluated to determine the
sequence the lower order constituents are to be played." Figures 9 and 10
show hierarchical trees, not DAGs as claimed and known in the art.

A DAG is a directed graph with no directed cycles. Every directed acyclic
graph corresponds to a partial order on its vertices. A hierarchical tree as in
Boetje is a complete ordering of the vertices. The Boetje tree is to
automatically detect inconsistencies in a schedule programmed by a
programmer, and not to determine similar interpretations of multimedia
content as claimed, see col. 33, line 40-col. 34, line 62; and see figs. 22a-b
and the associated text.
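
For illustration only, the structural difference between a general DAG and a hierarchical tree can be sketched in Python as follows. The function names `is_dag` and `is_rooted_tree` and the example graphs are hypothetical and are not taken from the application, from Yeo, or from Boetje; the sketch only shows that a graph in which a node has two parents can be a DAG without being a tree.

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph has no directed cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    # Repeatedly remove nodes that have no remaining incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # A cycle exists exactly when some nodes could never be removed.
    return removed == len(nodes)


def is_rooted_tree(nodes, edges):
    """Return True if the graph is acyclic, has one root, and every other node has one parent."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    roots = [n for n in nodes if indegree[n] == 0]
    one_parent_each = all(indegree[n] == 1 for n in nodes if indegree[n] != 0)
    return len(roots) == 1 and one_parent_each and is_dag(nodes, edges)


nodes = ["A", "B", "C", "D"]
diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]  # D has two parents
tree = [("A", "B"), ("A", "C"), ("B", "D")]                 # every node has one parent
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]    # directed cycle

print(is_dag(nodes, diamond), is_rooted_tree(nodes, diamond))
print(is_dag(nodes, tree), is_rooted_tree(nodes, tree))
print(is_dag(nodes, cycle), is_rooted_tree(nodes, cycle))
```

In this sketch the diamond-shaped graph passes the DAG check but fails the tree check, the second graph passes both, and the cyclic graph fails both.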

The invention measures attributes of content entities that include intensity
attributes. Yeo, at column 7, lines 35 et seq., measures correlations between
frames as differences:

The inventors discovered that measuring correlation
between two small images (even the DC images) does give
a very good indication of similarity (it is actually
dissimilarity in the definition below) in these images. By using the
sum of absolute difference, the correlation between two
images, f_m and f_n, is commonly computed by:

d(f_m, f_n) = \sum_{j,k} | f_m(j,k) - f_n(j,k) |
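
As a rough illustration of a sum-of-absolute-differences measure of the kind quoted from Yeo above, the following Python sketch compares two equally sized grayscale images pixel by pixel. The function name `sum_abs_diff` and the sample pixel values are assumptions made for this example only; the sketch does not reproduce Yeo's implementation. It shows that such a measure operates on frame luminance values, so that a larger value means greater dissimilarity.

```python
def sum_abs_diff(f_m, f_n):
    """Sum of absolute pixel differences between two images given as lists of rows."""
    same_shape = len(f_m) == len(f_n) and all(
        len(row_m) == len(row_n) for row_m, row_n in zip(f_m, f_n)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    # Accumulate |f_m(j,k) - f_n(j,k)| over every pixel position (j,k).
    return sum(
        abs(a - b)
        for row_m, row_n in zip(f_m, f_n)
        for a, b in zip(row_m, row_n)
    )


# Hypothetical 2x2 luminance images (for example, heavily reduced frames).
image_a = [[10, 20], [30, 40]]
image_b = [[12, 18], [30, 45]]

print(sum_abs_diff(image_a, image_a))  # identical images give 0
print(sum_abs_diff(image_a, image_b))  # 2 + 2 + 0 + 5 = 9
```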

Yeo does not measure attributes of content entities that include direction
attributes. Yeo measures, in images, the "two-dimensional moment invariant
of the luminance." Those of ordinary skill in the art would not confuse
direction and luminance, see column 7, lines 13-19.

Yeo does not measure attributes of content entities that include spatial
attributes and the order is spatial. The Yeo measurements take place on
frames.

Yeo does a temporal segmentation, but never measures temporal attributes of
content entities.

Yeo does not rank order attributes measured of content entities.

The scene transition graph in Yeo is not derived from video object planes.

There is nothing in column 19 that would indicate that Yeo generates a
summary of a video.






[Excerpt from Yeo reproduced as an image in the original; text not legible.]



Columns 6 through 8 also do not describe a video summary. Applicants
respectfully request the Examiner to point out which word(s) in Yeo mean "a
video summary." The Applicants have carefully read Yeo but cannot find any 
video summarization steps. 



At column 7, Yeo states: 

Shape

[Excerpt from Yeo, column 7, on the shape measure based on the
two-dimensional moment invariant of the luminance; reproduced as an image
in the original and not legible.]

There is absolutely nothing there to indicate that a measure of
similarity based on two-dimensional luminance would suggest a
three-dimensional video. A three-dimensional video is a video that also
includes depth, such as an MRI video or CAT scan.



Claimed are directed acyclic graphs where nodes represent the content 
entities and edges represent breaks in the segmentation, and the measured 
attributes are associated with the corresponding edges. Yeo teaches a graph 
"with nodes representing scenes and edges representing the progress of the 
story from one scene to the next." 



Claimed is at least one secondary content entity associated with a particular 
content entity, and wherein the secondary content entity is selected during 
the traversing. Nowhere in columns 2-6 are these limitations described. 



Claimed is a summary of the multimedia with a selected permutation of the
content entities according to the associated ranks. At column 9:

[illegible]
values between individual shots, allowed in a cluster. In test
trials of the present system, after the initial shot partitions,
the user only needs to slightly adjust the [illegible] to change
partitions to yield satisfactory results, often with less
than four such trials. FIGS. 3A and 3B show clustering results
[illegible]
video sequence, and a News Report, respectively. The
[illegible]
disclosed by Laszlo Szirmay-Kalos, in a paper
"Dynamic layout algorithm to display general graphs", in
Graphics Gems IV, pp. 505-517, Academic Press, Boston,
1994. FIGS. 4 and 5 show the sample interface and graph
layout of the two above-mentioned video sequences, based
on the results in FIGS. 3A and 3B, respectively. Each node
represents a collection of shots, clustered by the method
described above. For simplicity, only one frame is used to
represent the collection of shots. [illegible]
together to form further clusters, and ungroup some shots
[illegible]
to get a better understanding of the overall
[illegible]

Yeo allows the user to rearrange nodes in a graph. There is nothing there to
indicate that content entities can be permuted according to rank.



It is believed that this application is now in condition for allowance. A
notice to this effect is respectfully requested. Should further questions arise
concerning this application, the Examiner is invited to call Applicants'
attorney at the number listed below. Please charge any shortage in fees due
in connection with the filing of this paper to Deposit Account 50-0749.



Respectfully submitted,

Mitsubishi Electric Research Laboratories, Inc. 
By 

Dirk Brinkman 



201 Broadway, 8th Floor
Cambridge, MA 02139
Telephone: (617) 621-7539
Customer No. 022199



Dirk Brinkman

Attorney for the Assignee

Reg. No. 35,460